Search Results: "sjr"

3 November 2008

Simon Richter: The future

I don't know what Web 3.0 will look like, but Web 4.0 will be built with structural markup and <blink> tags.

10 October 2008

Simon Richter: Berlin

I'll be there.

2 October 2008

Simon Richter: A design question

As some of you may or may not be aware, I'm hacking on a new build tool that has a purely descriptive language for project descriptions, with lots of sensible defaults. Right now, the only thing I really require users to do is to declare the name and type of the project, for example:

  Program: foo

This will just take all files in this directory that it understands, compile them, and link them into an executable called "foo" (or "foo.exe"). So far, so good. However, there are more complex cases, specifically those where multiple outputs are installed to different directories, for example in the case of libraries:

  Library: foo
  Public: foo.h, foo.c

I've omitted the library versioning stuff from the example, because it isn't necessary to understand the problem. This means: take all compilable files, link them into "libfoo.so" (resp. "foo.dll"), and only mark the symbols defined in foo.c as exported (not listing code-generating files falls back to exporting everything); then install foo.h into /usr/include (or make it available as <foo.h> to dependent projects).

Now I'd like to add an optional parameter to place a "published" resource into a namespace. How that is defined depends on the type of the resource (exported symbols could be tagged with a symbol version, while include files would be installed into a subdirectory of /usr/include). And this is where I would like to get some input: what should such a declaration look like in the project description file? The file is in RFC822-style format, with one section per project (multiple projects in a single directory are permitted if you list the inputs explicitly), and I'd rather keep it that way. Ideas so far:

  1. Tag the field name:

       Public[foo]: foo.h
       Public[FOO_1_0]: foo.c

     This means multiple lines, and dividing .c and .h files, since their "namespace" attributes mean something entirely different.

  2. Tag each value:

       Public: foo.h [foo], foo.c [FOO_1_0]

     This ends up becoming pretty repetitive if I have a lot of header files.

Neither of these looks as "clean" as I'd like a new file format to look. I'm pretty sure "built-in" tagging of certain things will be handled by separate top-level tags (similar to automake):

  User-Public: userdoc.sgml
  Developer-Public: devdoc.sgml

I'm open to criticism on this one as well, though.
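For concreteness, here is a minimal, hypothetical sketch (not the actual tool) of parsing one such RFC822-style section, pulling the namespace tag out of a bracketed field name as in option 1. The field names and bracket syntax are assumptions from the examples above:

```python
import re

# Hypothetical sketch: match "Key: value" and "Key[namespace]: value" lines
# from one RFC822-style project section, as in option 1 above.
FIELD = re.compile(r'^(?P<key>[\w-]+)(?:\[(?P<ns>[^\]]+)\])?:\s*(?P<value>.*)$')

def parse_section(text):
    """Return a list of (key, namespace-or-None, [values]) tuples."""
    fields = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = FIELD.match(line)
        if m:
            values = [v.strip() for v in m.group('value').split(',')]
            fields.append((m.group('key'), m.group('ns'), values))
    return fields

section = """\
Library: foo
Public[foo]: foo.h
Public[FOO_1_0]: foo.c
"""
print(parse_section(section))
# [('Library', None, ['foo']), ('Public', 'foo', ['foo.h']),
#  ('Public', 'FOO_1_0', ['foo.c'])]
```

One nice property of option 1 is visible here: the bracketed tag parses as part of the field name, so plain RFC822 tooling that ignores unknown fields still works.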

21 September 2008

Simon Richter: I need to change my morning routine

Reading other people's blogs right after getting up leads to public embarrassment. :-P
1. Take a picture of yourself right now.
2. Don't change your clothes, don't fix your hair; just take a picture.
3. Post that picture with NO editing.
4. Post these instructions with your picture.

14 July 2008

Simon Richter: It's all in my head now

I spent four hours listening to muzak yesterday. Apparently, antitrust legislation prohibits T-Online from getting better access to T-Com's ticket system than their competitors have. Of course, that means "none at all". I do, however, disagree with their corollary that it is the customer's duty to call T-Com and tell them what is wrong. On a Sunday evening. After a lightning strike. With a Windows update breaking PPPoE for ZoneAlarm users. Seriously, get your processes straight.

7 July 2008

Simon Richter: Yay

Dammit has learned to arrange files into a tree. This is awesome, because Missing so far: Thus, no new upload yet; example tree below the fold.

Simon Richter: I can has a Debian mirror?

I suppose ISP happiness goes the other way.

2 July 2008

Simon Richter: hjkl

Decklin, hjkl has saved my butt several times when I was sitting at a thoroughly broken terminal. While I don't generally use it in daily life, it is good to have a fallback that works really everywhere. (Email? Who needs email?)

26 June 2008

Simon Richter: Seconded

This. (context)

20 June 2008

Simon Richter: :-/

15 June 2008

Simon Richter: Not going to DebConf

Since right now I cannot afford the travel expenses to go to Argentina, it's highly unlikely that I can come to DebConf. :-(

12 June 2008

Simon Richter: I couldn't decide on eyebrows

All hail Joey's indifferent-looking new sun god.

8 June 2008

Simon Richter: Hardware abstraction

There is a bit of discussion on Planet Debian about dbus and HAL and how to handle the daemon restarting. Basically, our "hardware abstraction" layer is actually more like a policy layer: the kernel provides the real hardware abstraction, but its access control features are not fine-grained enough, and there is no revocation of privileges.

Right now, there are two systems in use: X11 for keyboard/video/mouse, and dbus/hal for the rest. X11 has no problems with access control, because any process that wants to talk to the X server is owned by the user sitting at that terminal, and when that user logs off, the X server becomes uninteresting to us. For other hardware, that model doesn't entirely work. KVM is unique in that it only affects one user, and there is a clear definition of that user going away. To summarize my desired policies:

Sound
is similar to KVM, but I often log out in the evening and have something playing while I fall asleep, so the policy on the sound device is that access is prolonged even after logging out.

USB drives
should be accessible to the user who plugged them in, and stay with them even after they log out, until they are disconnected. I'd like to have block-level access to them and run my file system drivers in user space (so I can format a pen drive without administrative access to the machine). That does not apply to all USB drives, though, so I'd like to be able to "recognize" certain volumes and handle them differently, not so much as a security measure as a safety one.

Smart cards
should be like KVM, really. No one but me should ever be able to talk to my cards, because causing a DoS is just so easy: enter the wrong code three times, and I have a big problem. Obviously, that is part of the security my card provides (if you want to upload Debian packages in my name, you have to steal my wallet and get my PIN right in the first three attempts).

USB phone handset
I'd like the phone to be able to receive calls at any time, even when no one is logged in, and the currently logged-on user should be able to make a call (but not on my account).

Specialty hardware
like the JTAG adapter hanging off USB on my box needs a different policy too.

Any of these (except for the smart card) might mean that the device stays connected and in use for a very long time, longer than I can sit at the console. Unlike humans, who get only mildly annoyed if the display closes and applications need to be restarted, there are serious data consistency issues if connections break and need to be rebuilt, and the policy needs to allow for rebuilding the connection securely. If we want to allow handing over a resource to a new instance of the server or the client, then the protocol needs to be aware of this, as we need to synchronize server and client state after a connection breakdown. It might be possible to create a session management layer for certain services (analogous to X11 toolkits that transparently rebuild the widgets on restart without bothering the application), but it would probably be difficult to do without a full protocol redesign, as data loss is simply not as acceptable as when talking to a human who can pick up the pieces and go from there.
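The per-device policies above could be written down as data rather than code. A minimal sketch, assuming a hypothetical policy daemon and made-up device-class names (this is not any real HAL or dbus API):

```python
from dataclasses import dataclass

# Hypothetical: encode each device-class policy from the list above as data,
# so a policy daemon could look it up when a device appears.
@dataclass(frozen=True)
class Policy:
    owner: str              # who may access the device
    survives_logout: bool   # does access persist after the session ends?
    exclusive: bool         # may no one else talk to the device?

POLICIES = {
    "kvm":       Policy(owner="console user",  survives_logout=False, exclusive=True),
    "sound":     Policy(owner="console user",  survives_logout=True,  exclusive=False),
    "usb-drive": Policy(owner="plugging user", survives_logout=True,  exclusive=True),
    "smartcard": Policy(owner="card owner",    survives_logout=True,  exclusive=True),
    "usb-phone": Policy(owner="console user",  survives_logout=True,  exclusive=False),
}

print(POLICIES["sound"].survives_logout)  # sound keeps playing after logout
```

Writing the policies as a table like this also makes the hard part obvious: none of these flags capture the reconnection-after-breakdown problem, which needs protocol support rather than access control.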

Simon Richter: ChessML?

I have a massive headache right now and my chess board is buried somewhere in the mess that is my room, so I had a bit of trouble understanding damog's latest post, and it got me wishing for a way that my computer could help me here. Obviously, displaying chess boards has been possible for a really long time, but there is no common data format yet which could be used to express a state or movements in a way that would allow my browser to display them. Any XML experts out there who want to write a DTD?
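For reference, chess state can at least be serialized compactly as plain text: the piece-placement field of FEN (Forsyth-Edwards Notation, part of the PGN standard) encodes a position, with digits standing for runs of empty squares. A small sketch of expanding it into a grid a renderer could draw (the `fen_board` helper is made up for illustration):

```python
# Expand the FEN piece-placement field into an 8x8 text grid.
# Letters are pieces (upper case = White), digits are runs of empty squares.
def fen_board(placement):
    rows = []
    for rank in placement.split('/'):
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend('.' * int(ch))   # a digit n means n empty squares
            else:
                row.append(ch)
        rows.append(''.join(row))
    return rows

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
for row in fen_board(start):
    print(row)
```

An XML format for the browser would essentially be this grid plus markup; the hard part the post asks about is agreeing on the schema, not the data.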

4 June 2008

Simon Richter: Hat

Today's xkcd makes me want to get a hat. Update: heh.

17 May 2008

Simon Richter: Solutions

(tl;dr: if you have a few minutes, please add information here)

Joss, the problem with the new package formats is that there is nothing that actually uses the additional information in a way that adds significant new functionality, so the net result of the change is that we throw away the information at a different layer in our software stack, and one of the interfaces got a lot more complicated in the process. One possible application would be a "poor man's patch tracking" inside the BTS, perhaps with a new state "fixed in patch". I can see two ways of implementing that:

1. by extending the interface of the "new" package formats so that Debian/Ubuntu bug numbers are attached to the actual patch files, and having the archive maintenance software extract and process that information (reject packages that add a patch for a bug without closing it in the changelog, notify the BTS), or
2. by leaving the package format untouched and simply adding a regex matching "Fixes: #nnnnnn" that is reported to the BTS as "we have added a patch", so the submitter is notified that the bug is gone for him/her; the bug is then closed in the changelog of the upload removing the patch.

The former approach also allows us to link to patches from BTS pages, which the latter doesn't, so there could be actual benefit here if we believe it is worth the additional complexity. (Update: Raphaël thinks it is. I like the idea of a package format with separate patches a lot more in this context than I did without it, but my fear that it will actually be perceived as sanctioning large patchsets remains.)

About mandatory co-maintenance: the problem isn't "helping". We have plenty of people with commit access to packages they don't even remotely understand who are really helpful (not). The problem is that someone needs to actually read all the commit logs and understand what the changes do in this context. In most cases, that person or group would be upstream, not a DD. My first impression after reading the patch was "adding uninitialized data to the entropy pool is pointless/harmful as it is not random, so this patch makes sense", because the loop around it was not contained in the patch. Obviously I'm not an OpenSSL developer. There is nothing Debian could have done internally to verify the correctness of this patch that would properly scale to the entire archive, even if we put "more emphasis on security".

The only solution I see is reporting every patch to upstream immediately and getting affirmation that it is correct. This, however, means that we need to produce patches that upstream can accept. For obvious code bugs, that is simple, but for integration patches like paths it is not sufficient to replace one string with another; rather, the value should be made configurable in some place that can be reached from debian/rules. In an ideal world, we end up with very few Debian-specific patches, so essentially we are talking about adding functionality to dpkg that we don't want to use.
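The second option's scan is simple enough to sketch. The "Fixes: #nnnnnn" marker is the hypothetical convention proposed above, and `bugs_fixed` is a made-up helper name:

```python
import re

# Hypothetical sketch of option 2: scan a patch file's header for
# "Fixes: #nnnnnn" markers and collect bug numbers to report to the BTS.
FIXES = re.compile(r'^Fixes:\s*#(\d+)', re.MULTILINE)

def bugs_fixed(patch_text):
    return [int(n) for n in FIXES.findall(patch_text)]

patch = """\
Fixes: #123456
Fixes: #654321

--- a/rand.c
+++ b/rand.c
"""
print(bugs_fixed(patch))  # [123456, 654321]
```

Anchoring the pattern at line starts keeps the scan from picking up a "Fixes:" string that merely appears inside the patched code itself.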
I've started a page in the Debian Wiki, Getting Packaged, with an outline of a possible document aimed at upstream developers that should list the typical problems we run into and how to avoid them.

Simon Richter: OLPC and Windows

There is an article on TechCrunch about Windows on the OLPC. This article started out as a comment below lots of comments that were missing the point, but eventually grew too large.

The entire discussion circles around the question of whether it would be beneficial to give the users the same view and behaviour that is on 90% of machines worldwide, so they can start out prospective jobs with a minimum of training. Learning your way around the UI is only a significant part of training if the actual work you will do is trivial, so this argument basically boils down to "I don't expect the African kids to do anything but grunt work during their lifetime anyway, so we better start training them early", which is the wrong approach not only to education.

To make a bad car analogy: roads are usually made of several layers, from the foundation providing stability up to the paint defining lanes. Operating systems are similarly layered, with a core that applications (cars) never touch directly, and several other layers on top that are not really required for basic functionality but that add safety (process separation) or comfort (standard functions). The minimum standard is a "platform definition", which all car (or application) makers can rely on: all roads have a minimum width and there are no dangerous spikes (if that is not true, you can get a steamroller, or respectively format your hard disk). Railways use the same kind of foundation (operating system), but the platform (heh) is quite different. You cannot drive a car on a railway, or a train on a road, just as you cannot run a Windows application on a Linux system or vice versa (there are special wagons you can place your car on, and special trucks with rails on them if you feel like it, but these are heavier and need more energy to pull). Now, in this discussion, people have been comparing Windows (the platform) to Linux (the operating system). That doesn't work.

On Linux, there are several platforms available, the most prominent being GNOME and KDE for the desktop and the POSIX utilities on the command line, but there are lots of others as well. Part of most platform definitions is a user interface, which abstracts what is really happening into something comprehensible to the user, using analogies (a tachometer usually displays our speed as an angle, but other representations are possible). The "desktop" idiom happened to be the first graphical UI some thirty years back, and was perpetuated into today's computers (just like the width of roads hasn't changed since the days of the Roman empire, where it was "two horses and then some"); however, this doesn't mean it is the best choice available, it's just what we are used to. If you look at the screen contents of day traders' computers (lots of that on TV right now thanks to the market crisis), you will notice the vast majority do not use overlapping windows or standardized raised buttons to click on; rather, they have a tightly packed grid layout with high-contrast information displays that also colour-code certain messages. I think that is the most important point here: to achieve optimal results, the presentation idiom needs to be chosen in a task-specific way. With children as the target audience, we lose one of the key requirements behind the adoption of the windowed view: the need for side-by-side presentation of data from multiple unrelated sources (which is also a problem given the lack of screen space). With the introduction of ad-hoc mesh networking and collaborative applications, the "desktop" analogy begins to break down. The project's mission also defines requirements on the platform.

If we want to keep the requirement "users should be able to build and share their own stuff", then we want a framework where it is hard to make mistakes, especially those that can be spotted only after an interesting failure, and, more importantly, where it is impossible to write code that makes unrelated components fail, because these components might be your way back out of the situation. Windows has an excellent event model with fairly good isolation of components (to the point where a problem in an event handler can be handled by the event loop rather than terminating the program, so for example Internet Explorer can shut down broken plugins rather than crashing), but the detail knowledge required to really work with the API (how to build a message loop that also runs queued I/O completion handlers correctly) leads to a fairly steep learning curve, and would teach implementation details rather than concepts. The normal "linuxy" approach of going low-level whenever higher-level approaches fail is not the answer either, as we want to truly empower people rather than just training them to be a cheap replacement for outsourced tech support, so it is vital that the "real" applications use the same framework that people implementing new things would use; thus all the complexity that we want in our "official" applications needs to be taken care of by the platform, with all the safety features in place too. So no existing platform provides what we want; hence Sugar. And that is the problem for Windows advocates: Sugar replaces those bits that make Windows a platform and not just a kernel, so porting Sugar to Windows doesn't make sense from a technical point of view, since we already replaced the bits that we didn't have free software for before. Other than that, the "Linux vs. Windows" kernel choice is secondary; in fact, both kernels are very similar in design and function, and the various advantages and disadvantages of either aren't that relevant really.

The only technical reason in favour of Linux is virtual memory management: the Windows VMM behaves erratically in the absence of a swap device, though I believe that could be fixed. The reason why I believe Linux is the better choice here is long-term support. Since these devices will be used in basic education (which hasn't changed that much in the past years, as 1 plus 1 still equals 2), there is hardly any need for radical changes after the initial rollout: why add instability when you don't have to? With Microsoft being a for-profit company, there needs to be a business model sustaining this, and I believe it will be very hard to find one. "Subscription" falls down in that it is a long-term recurring expense, which governments tend to be pretty wary of. The alternative is to upgrade several million computers' OS every few years. Lots of companies are skipping entire Windows releases because of the migration cost, and even with the "console bonus" (all hardware is the same) and bootloader support for software upgrades over a mesh network, this is still a massive endeavour. That each machine would have to reserve enough memory for the entire "upgrade pack" so it can transition "in one go" also makes this model unworkable.

To summarize: using Windows on the OLPC does not make sense at all. If you use just the kernel, you don't gain anything over Linux, and if you use the entire platform (and by extension, the UI), you add unnecessary complexity that is not only not required for the actual task, but also distracting. If you add restrictions and extensions to make it work, you invent a new platform, which is precisely what Sugar did. The argument that it is important for pupils to use the same thing the rest of the world is using, to ease their entry into the workforce, is bogus at best and racist at worst.

2 May 2008

Simon Richter: Say...

... does anyone know what the expected behaviour would be if I returned a "multipart/related" document from an HTTP request?

Simon Richter: PDFs: doing it wrong

John writes about PDFs and the problem of embedded fonts that are missing due to restrictive licencing. Font designers usually distinguish between "embedding" and "using" a font, prohibiting the former in their licence agreement. The difference is that when a font is "used", all information about the text is lost, and only rendering instructions remain (usually a set of Bezier curves).

PDF allows you to annotate a rendered page in various ways; among other things, it allows you to specify that a certain region should be interpreted as a specific text. For example, OCR tools working with PDF files start with a file containing just the image data and add the textual information, so you end up with a file that looks like the scan but has selectable text. While this is certainly not legal advice, so far it has been considered acceptable to ship a PDF file with pre-rendered text and an annotation that facilitates copy&paste (which you also need with embedded fonts, to allow selections to see a ligature as two distinct characters). These files may be slightly larger than they would be with an embedded font, but they have the advantage of looking exactly the same everywhere while being trivial to render, as the rasterizer does not need to understand anything about text layout.

tl;dr: if you have a document that people should be able to see exactly the way you intended it, please include the rendered text curves if you may not embed the font directly, and provide selection hints.

Simon Richter: .deb versus .msi

I've learnt a lot about building MSI files during the last few days. In a way, it is very similar to building Debian packages -- you hand some magic tool a directory with files and a directory with meta-information. The main difference is that MSI files cannot call arbitrary commands in their installation sequence, which serves a useful purpose: as long as they make do with the standard actions provided with the base installer, it is guaranteed that you can undo all changes, and you can get a full overview of what the installation will do without running any untrusted code. If the standard actions aren't enough (for example, they don't cover driver installation), you can ship a DLL that can be loaded into the installer. You lose a bit of the audit functionality there, but that is not a problem in practice, as these plug-ins are signed separately from the package, so for every genuine problem there is already a plug-in that you can simply "merge" into your package.
